Purpose

Using differentially expressed genes from pre/post exercise human muscle, SignatureSearch was used to find drugs of similar perturbation signatures as possible candidates for Cortes R01 prelim data analysis for “exercise mimetics” for Alzheimers Disease.
Dataset:
GitHub:
System which operations were done on: MacBook Pro (16-inch, 2019), Processor 2.4 GHz 8-Core Intel Core i9, Memory 64 GB 2667 MHz DDR4, Graphics AMD Radeon Pro 5300M 4 GB Intel UHD Graphics 630 1536 MB

Contents - CMAP Method
- LINCS Method

#reference database
eh <- ExperimentHub()
## snapshotDate(): 2020-10-27
cmap <- eh[["EH3223"]]; cmap_expr <- eh[["EH3224"]]
## see ?signatureSearchData and browseVignettes('signatureSearchData') for documentation
## loading from cache
## see ?signatureSearchData and browseVignettes('signatureSearchData') for documentation
## loading from cache

Load in data

#DESeq2 results
human_deseq2_res <- read.csv("210518_Human_DESeq_Res.csv", header = TRUE, row.names = 1)

#LFC non NA genes
human_LFC <- human_deseq2_res[!is.na(human_deseq2_res$log2FoldChange),]
#Get rid of specific transcript .X numbers 
rownames(human_LFC) <- sub("\\..*", "", rownames(human_LFC))

Find up and down genes

#showing up empty for LFC > 2 and p-adj < 0.05, so changed LFC to 1.5
human_up_df <- human_LFC[human_LFC$log2FoldChange >1 & human_LFC$padj <0.05,]
human_up_list <- rownames(human_up_df)
# Jen used mapIds but it sounds like it could be the same as select(), but mapIds gives error: Error in .processFilterParam(keys, x) : 'filter' has to be an 'AnnotationFilter', a list of 'AnnotationFilter' object, an 'AnnotationFilterList' or a valid filter expression!
human_up_list <- mapIds(EnsDb.Hsapiens.v75, keys = human_up_list, column="ENTREZID", keytype="GENEID", multiVals="first")
## Warning: Unable to map 138 of 252 requested IDs.
#correlation might be better method because the LFC?
human_down_df <- human_LFC[human_LFC$log2FoldChange < -1 & human_LFC$padj <0.05,]
human_down_list <- rownames(human_down_df)
human_down_list <- mapIds(EnsDb.Hsapiens.v75, keys= human_down_list , column="ENTREZID", keytype="GENEID", multiVals="first")
## Warning: Unable to map 42 of 54 requested IDs.

CMAP

(STRAIGHT FROM JEN’S MARKDOWN )Lamb et al. (2006) introduced the gene expression-based search method known as Connectivity Map (CMap) where a GES database is searched with a query GES for similar entries (Lamb et al. 2006). Specifically, the GESS method from Lamb et al. (2006), here termed as CMAP, uses as query the two label sets of the most up- and down-regulated genes from a genome-wide expression experiment, while the reference database is composed of rank transformed expression profiles (e.g. ranks of LFC or z-scores). The actual GESS algorithm is based on a vectorized rank difference calculation. The resulting Connectivity Score expresses to what degree the query up/down gene sets are enriched on the top and bottom of the database entries, respectively. The search results are a list of perturbagens such as drugs that induce similar or opposing GESs as the query. Similar GESs suggest similar physiological effects of the corresponding perturbagens.
Function qSig() builds an object to store the query signature, reference database and GESS method used for GESS methods

qsig_cmap_human <- qSig(query = list(upset=as.character(human_up_list), downset=as.character(human_down_list)), 
                  gess_method="CMAP", refdb= "cmap")
## see ?signatureSearchData and browseVignettes('signatureSearchData') for documentation
## loading from cache
## 59 / 252 genes in up set share identifiers with reference database
## 6 / 54 genes in down set share identifiers with reference database
cmap <- gess_cmap(qSig= qsig_cmap_human, chunk_size=5000)
cmap_res <- result(cmap)[1:50,]
write.csv2(cmap_res, file = "/Users/eramsey/Desktop/R21_210302/PreliminaryR21/output/SignatureSearch/210518_cmap_res_human_revDESeq2contrasts.csv")
cmap_res
## # A tibble: 50 x 10
##    pert        PCID   cell  type  trend raw_score scaled_score N_upset N_downset
##    <chr>       <chr>  <chr> <chr> <chr>     <dbl>        <dbl>   <int>     <int>
##  1 SB-202190   53539… MCF7  trt_… up        0.859        1          59         6
##  2 pilocarpine 5909   PC3   trt_… down     -0.808       -1          59         6
##  3 doxorubicin 31703  PC3   trt_… down     -0.797       -0.987      59         6
##  4 tropine     <NA>   MCF7  trt_… down     -0.786       -0.974      59         6
##  5 etanidazole 3276   HL60  trt_… down     -0.780       -0.965      59         6
##  6 dirithromy… 64738… MCF7  trt_… down     -0.779       -0.964      59         6
##  7 nimesulide  4495   MCF7  trt_… down     -0.763       -0.944      59         6
##  8 scopolamin… <NA>   MCF7  trt_… down     -0.760       -0.941      59         6
##  9 N-acetyl-L… <NA>   HL60  trt_… down     -0.754       -0.933      59         6
## 10 adrenoster… 69273… HL60  trt_… down     -0.750       -0.928      59         6
## # … with 40 more rows, and 1 more variable: t_gn_sym <chr>
#JEN : This table contains the search results for each perturbagen (here drugs) in the reference database ranked by their signature similarity to the query. For the CMAP method, the similarity metrics are raw_score and scaled_score. The raw score represents the bi-directional enrichment score (Kolmogorov-Smirnov statistic) for a given up/down query signature. Under the scaled_score column, the raw_score has been scaled to values from 1 to -1 by dividing positive scores and negative scores with the maximum positive score and the absolute value of the minimum negative score, respectively. The remaining columns in the search result table contain the following information. pert: name of perturbagen (e.g. drug) in the reference database; cell: acronym of cell type; type: perturbation type, e.g. compound treatment is trt_cp; trend: up or down when reference signature is positively or negatively connected with the query signature, respectively; N_upset or N_downset: number of genes in the query up or down sets, respectively; t_gn_sym: gene symbols of the corresponding drug targets.

Visualize GESS Results

#drugs_top15 <- unique(cmap_res$pert)[1:15]
drugs_top30 <- unique(result(cmap)$pert)[1:30]
gess_res_vis(cmap_res, drugs = drugs_top30, col = "scaled_score")

TSEA
The following introduces how to perform TSEA on drug-based GESS results using as functional annotation systems GO, KEGG and Reactome pathways. … The specialized enrichment algorithms include Duplication Adjusted Hypergeometric Test (dup_hyperG), Modified Gene Set Enrichment Analysis (mGSEA) and MeanAbs (mabs).
Internally, the latter converts the drug set to a target set, and then computes for it enrichment scores for each MF GO term based on the hypergeometric distribution. The enrichment results are stored in a feaResult object. It contains the organism information of the annotation system, and the ontology type of the GO annotation system. If the annotation system is KEGG, the latter will be “KEGG”. The object also stores the input drugs used for the enrichment test, as well as their target information.

tsea_kegg_cmap_human <- tsea_dup_hyperG(drugs = drugs_top30[1:20], type = "KEGG",
                               pvalueCutoff=0.5, qvalueCutoff = 0.5)
## No targets found in DrugBank/CLUE/STITCH database for 10 drugs: 
## tropine / etanidazole / dirithromycin / scopolamine n-oxide / n-acetyl-l-leucine / adrenosterone / cytochalasin b / pentetrazol / alfadolone / adiphenine
## Loading required package: org.Hs.eg.db
## 
## Reading KEGG annotation online:
## 
## Reading KEGG annotation online:
result(tsea_kegg_cmap_human)
## # A tibble: 192 x 9
##    ID     Description  GeneRatio BgRatio   pvalue p.adjust  qvalue itemID  Count
##    <chr>  <chr>        <chr>     <chr>      <dbl>    <dbl>   <dbl> <chr>   <int>
##  1 hsa00… Nitrogen me… 6/53      17/8112 6.81e-10  1.41e-7 6.09e-8 759/23…     6
##  2 hsa04… Adrenergic … 10/53     150/81… 3.42e- 8  3.53e-6 1.53e-6 207/56…    10
##  3 hsa04… Growth horm… 9/53      119/81… 5.98e- 8  4.13e-6 1.78e-6 207/29…     9
##  4 hsa05… Lipid and a… 11/53     215/81… 1.01e- 7  4.16e-6 1.80e-6 207/29…    11
##  5 hsa00… Steroid hor… 7/53      61/8112 1.12e- 7  4.16e-6 1.80e-6 1571/1…     7
##  6 hsa04… Relaxin sig… 9/53      129/81… 1.21e- 7  4.16e-6 1.80e-6 207/56…     9
##  7 hsa05… Yersinia in… 9/53      137/81… 2.03e- 7  5.28e-6 2.28e-6 207/29…     9
##  8 hsa04… AGE-RAGE si… 8/53      100/81… 2.26e- 7  5.28e-6 2.28e-6 207/56…     8
##  9 hsa04… Fc epsilon … 7/53      68/8112 2.41e- 7  5.28e-6 2.28e-6 207/24…     7
## 10 hsa04… MAPK signal… 12/53     294/81… 2.89e- 7  5.28e-6 2.28e-6 207/56…    12
## # … with 182 more rows
cmap_res_down <- cmap_res[cmap_res$trend =="down",]
cmap_res_up <- cmap_res[cmap_res$trend =="up",]

down_drug_cmap <- unique(cmap_res_down$pert)[1:20]
up_drug_cmap <- unique(cmap_res_up$pert)[1:20]

gess_res_vis(cmap_res, drugs = up_drug_cmap, col = "scaled_score")

dtnetplot(drugs = drugs(tsea_kegg_cmap_human), set = "hsa00140", 
          desc="Steroid hormone biosynthesis")
## No targets found in DrugBank/CLUE/STITCH database for 10 drugs: 
## tropine / etanidazole / dirithromycin / scopolamine n-oxide / n-acetyl-l-leucine / adrenosterone / cytochalasin b / pentetrazol / alfadolone / adiphenine
#u).

DSEA
Instead of translating ranked lists of drugs into target sets, as for TSEA, the functional annotation categories of the targets can be assigned to the drugs directly to perform Drug Set Enrichment Analysis (DSEA) instead. Since the drug lists from GESS results are usually unique, this strategy overcomes the duplication problem of the TSEA approach. This way classical enrichment methods, such as GSEA or tests based on the hypergeometric distribution, can be readily applied without major modifications to the underlying statistical methods. As explained above, TSEA and DSEA performed with the same enrichment statistics are not expected to generate identical results. Rather they often complement each other’s strengths and weaknesses.

dsea_res_cmap <- dsea_hyperG(drugs = drugs_top30[1:20], type = "KEGG", 
                            pvalueCutoff = 0.05, qvalueCutoff = 0.05, 
                            minGSSize = 10, maxGSSize = 2000)
result(dsea_res_cmap)
## # A tibble: 3 x 9
##   ID     Description   GeneRatio BgRatio  pvalue p.adjust qvalue itemID    Count
##   <chr>  <chr>         <chr>     <chr>     <dbl>    <dbl>  <dbl> <chr>     <int>
## 1 hsa05… Chemical car… 4/10      732/20… 2.85e-4   0.0424 0.0343 pilocarp…     4
## 2 hsa00… Metabolism o… 4/10      869/20… 5.49e-4   0.0424 0.0343 pilocarp…     4
## 3 hsa00… Steroid horm… 4/10      906/20… 6.43e-4   0.0424 0.0343 pilocarp…     4
write.csv2(result(dsea_res_cmap), file =  "/Users/eramsey/Desktop/R21_210302/PreliminaryR21/signatureSearch_outputs//210526_dsea_cmap_res_human.csv")
#five top pathways KEGG upregulated or downregulated, KEGG usually has maps that relate to pathways, but some geens are linked to multiple pathways 
#Are there genes that are linked to linked to multiple KEGG pathways?
dtnetplot(drugs = drugs(dsea_res_cmap), set = "hsa00140", 
          desc="Steroid hormone biosynthesis")
## No targets found in DrugBank/CLUE/STITCH database for 10 drugs: 
## tropine / etanidazole / dirithromycin / scopolamine n-oxide / n-acetyl-l-leucine / adrenosterone / cytochalasin b / pentetrazol / alfadolone / adiphenine

LINCS Method
Subramanian et al. (2017) introduced a more complex GESS algorithm, here referred to as LINCS. While related to CMAP, there are several important differences among the two approaches. First, LINCS weights the query genes based on the corresponding differential expression scores of the GEPs in the reference database (e.g. LFC or z-scores). Thus, the reference database used by LINCS needs to store the actual score values rather than their ranks. Another relevant difference is that the LINCS algorithm uses a bi-directional weighted Kolmogorov-Smirnov enrichment statistic (ES) as similarity metric.

qsig_lincs_human <- qSig(query =list(upset=as.character(human_up_list), downset=as.character(human_down_list)), 
                   gess_method="LINCS", refdb="lincs")
## see ?signatureSearchData and browseVignettes('signatureSearchData') for documentation
## loading from cache
## 59 / 252 genes in up set share identifiers with reference database
## 6 / 54 genes in down set share identifiers with reference database

The similarity scores stored in the LINCS result table are summarized here. WTCS: Weighted Connectivity Score; WTCS_Pval: nominal p-value of WTCS; WTCS_FDR: false discovery rate of WTCS_Pval; NCS: normalized connectivity score; NCSct: NCS summarized across cell types; Tau: enrichment score standardized for a given database. The latter is only included in the result table if tau=TRUE in a gess_lincs function call. The example given is run with tau=FALSE, because the tau values are only meaningful when the complete LINCS database is used which is not the case for the toy database.

The following provides a more detailed description of the similarity scores computed by the LINCS method. Additional details are available in the Supplementary Material Section of the Subramanian et al. (2017) paper.

WTCS: The Weighted Connectivity Score is a bi-directional ES for an up/down query set. If the ES values of an up set and a down set are of different signs, then WTCS is (ESup-ESdown)/2, otherwise, it is 0. WTCS values range from -1 to 1. They are positive or negative for signatures that are positively or inversely related, respectively, and close to zero for signatures that are unrelated.

WTCS_Pval and WTCS_FDR: The nominal p-value of the WTCS and the corresponding false discovery rate (FDR) are computed by comparing the WTCS against a null distribution of WTCS values obtained from a large number of random queries (e.g. 1000).

NCS: To make connectivity scores comparable across cell types and perturbation types, the scores are normalized. Given a vector of WTCS values w resulting from a query, the values are normalized within each cell line c and perturbagen type t to obtain the Normalized Connectivity Score (NCS) by dividing the WTCS value by the signed mean of the WTCS values within the subset of signatures in the reference database corresponding to c and t.

NCSct: The NCS is summarized across cell types as follows. Given a vector of NCS values for perturbagen p, relative to query q, across all cell lines c in which p was profiled, a cell-summarized connectivity score is obtained using a maximum quantile statistic. It compares the 67 and 33 quantiles of NCSp,c and retains whichever is of higher absolute magnitude.

Tau: The standardized score Tau compares an observed NCS to a large set of NCS values that have been pre-computed for a specific reference database. The query results are scored with Tau as a standardized measure ranging from 100 to -100. A Tau of 90 indicates that only 10% of reference perturbations exhibit stronger connectivity to the query. This way one can make more meaningful comparisons across query results.

lincs_human <- gess_lincs(qsig_lincs_human, sortby="NCS", tau=TRUE)
result(lincs_human)[1:50,]
## # A tibble: 50 x 15
##    pert         PCID    cell  type  trend   WTCS WTCS_Pval WTCS_FDR   NCS    Tau
##    <chr>        <chr>   <chr> <chr> <chr>  <dbl>     <dbl>    <dbl> <dbl>  <dbl>
##  1 BRD-K863366… 546541… PHH   trt_… down  -0.617 0.0000167  9.53e-5 -1.77  -99.9
##  2 BRD-K904461… 445011… PC3   trt_… up     0.641 0.0000156  9.53e-5  1.75   99.4
##  3 PD-173074    1401    PHH   trt_… up     0.619 0.0000166  9.53e-5  1.73   99.3
##  4 garcinol     737075… NPC   trt_… down  -0.609 0.0000170  9.53e-5 -1.73  -99.5
##  5 GW-3965      447905  SKB   trt_… up     0.636 0.0000158  9.53e-5  1.73   99.4
##  6 BRD-A391216… 2793096 MCF7  trt_… down  -0.627 0.0000162  9.53e-5 -1.71  -99.6
##  7 ASN-05257430 650361  A549  trt_… down  -0.608 0.0000171  9.53e-5 -1.71  -99.9
##  8 thiethylper… 5440    MCF7  trt_… down  -0.616 0.0000167  9.53e-5 -1.68 -100. 
##  9 geranylgera… 5281365 MCF7  trt_… down  -0.616 0.0000167  9.53e-5 -1.68  -99.7
## 10 BRD-A741349… 3950572 HA1E  trt_… down  -0.632 0.0000160  9.53e-5 -1.68  -99.8
## # … with 40 more rows, and 5 more variables: TauRefSize <int>, NCSct <dbl>,
## #   N_upset <int>, N_downset <int>, t_gn_sym <chr>
lincs_res <- result(lincs_human)
write.csv(lincs_res, file = "/Users/eramsey/Desktop/R21_210302/PreliminaryR21/signatureSearch_outputs/210526_lincs_human_res.csv")
drugs_lincs_top <- unique(lincs_res$pert)[1:20]
gess_res_vis(lincs_res, drugs = drugs_lincs_top, col = "NCS")

lincs_res_down <- lincs_res[lincs_res$trend == "down",]
down_drug_lincs <- unique(lincs_res_down$pert)[1:20]
lincs_res_up  <- lincs_res[lincs_res$trend == "up",]
up_drug_lincs <- unique(lincs_res_up$pert)[1:20]

drugs_lincs_topdown <- unique(lincs_res_down$pert)[1:20]
gess_res_vis(lincs_res, drugs = drugs_lincs_topdown , col = "NCS")

tsea_lincs_down <- tsea_dup_hyperG(drugs = drugs_lincs_topdown[1:20], type = "KEGG",
                               pvalueCutoff=0.5, qvalueCutoff = 0.5)
## No targets found in DrugBank/CLUE/STITCH database for 12 drugs: 
## brd-k86336638 / garcinol / brd-a39121618 / asn-05257430 / geranylgeraniol / brd-a74134989 / brd-k31302860 / rhapontin / brd-k93410934 / brd-k36591038 / brd-a02241946 / sinensetin
result(tsea_lincs_down)
## # A tibble: 50 x 9
##    ID     Description GeneRatio BgRatio   pvalue p.adjust   qvalue itemID  Count
##    <chr>  <chr>       <chr>     <chr>      <dbl>    <dbl>    <dbl> <chr>   <int>
##  1 hsa05… Nicotine a… 22/53     40/8112 5.67e-40 5.27e-38 4.12e-38 2555/2…    22
##  2 hsa04… Neuroactiv… 26/53     341/81… 2.22e-22 1.03e-20 8.07e-21 1812/1…    26
##  3 hsa05… Morphine a… 16/53     91/8112 1.69e-19 5.23e-18 4.09e-18 1812/2…    16
##  4 hsa04… GABAergic … 15/53     89/8112 5.25e-18 1.22e-16 9.54e-17 2555/2…    15
##  5 hsa04… Retrograde… 16/53     148/81… 5.52e-16 1.03e-14 8.02e-15 2555/2…    16
##  6 hsa04… Taste tran… 10/53     86/8112 1.41e-10 2.19e- 9 1.71e- 9 2555/2…    10
##  7 hsa05… Cocaine ad… 8/53      49/8112 7.04e-10 9.35e- 9 7.30e- 9 1812/2…     8
##  8 hsa05… Amphetamin… 8/53      69/8112 1.18e- 8 1.37e- 7 1.07e- 7 1812/2…     8
##  9 hsa04… Glutamater… 8/53      114/81… 6.22e- 7 6.43e- 6 5.02e- 6 2902/2…     8
## 10 hsa04… cAMP signa… 10/53     216/81… 1.04e- 6 9.69e- 6 7.57e- 6 1812/2…    10
## # … with 40 more rows
write.csv2(result(tsea_lincs_down), file =  "/Users/eramsey/Desktop/R21_210302/PreliminaryR21/signatureSearch_outputs/210526_lincs_tsea.csv")
dsea_res_lincs <- dsea_hyperG(drugs = drugs_lincs_topdown, type = "KEGG", 
                            pvalueCutoff = 0.05 , qvalueCutoff = 0.05, 
                            minGSSize = 2, maxGSSize = 2000)
result(dsea_res_lincs)
## # A tibble: 0 x 9
## # … with 9 variables: ID <chr>, Description <chr>, GeneRatio <chr>,
## #   BgRatio <chr>, pvalue <dbl>, p.adjust <dbl>, qvalue <dbl>, itemID <chr>,
## #   Count <int>
#0 rows
drugs_lincs_topup <- unique(lincs_res_up$pert)[1:20]
gess_res_vis(lincs_res, drugs = drugs_lincs_topup , col = "NCS")

tsea_lincs_up <- tsea_dup_hyperG(drugs = drugs_lincs_topup[1:20], type = "KEGG",
                               pvalueCutoff=0.5, qvalueCutoff = 0.5)
## No targets found in DrugBank/CLUE/STITCH database for 13 drugs: 
## brd-k90446118 / badge / salicin / brd-a93015352 / brd-k91047982 / brd-k33928422 / brd-k31412180 / dibutyrylcyclic-gmp / brd-k79779816 / brd-k42687792 / hydrocotarnine / tetroquinone / brd-k08198779
result(tsea_lincs_up)
## # A tibble: 114 x 9
##    ID     Description  GeneRatio BgRatio  pvalue p.adjust  qvalue itemID   Count
##    <chr>  <chr>        <chr>     <chr>     <dbl>    <dbl>   <dbl> <chr>    <int>
##  1 hsa04… Calcium sig… 12/47     240/81… 7.14e-9  1.20e-6 9.09e-7 2263/22…    12
##  2 hsa04… Rap1 signal… 10/47     210/81… 2.45e-7  2.06e-5 1.56e-5 2263/22…    10
##  3 hsa04… PI3K-Akt si… 11/47     354/81… 3.96e-6  1.62e-4 1.23e-4 2263/22…    11
##  4 hsa04… MAPK signal… 10/47     294/81… 5.28e-6  1.62e-4 1.23e-4 2263/22…    10
##  5 hsa01… EGFR tyrosi… 6/47      79/8112 5.50e-6  1.62e-4 1.23e-4 2263/22…     6
##  6 hsa04… Ras signali… 9/47      232/81… 5.79e-6  1.62e-4 1.23e-4 2263/22…     9
##  7 hsa05… Central car… 5/47      70/8112 4.79e-5  1.08e-3 8.18e-4 2263/22…     5
##  8 hsa04… Adherens ju… 5/47      71/8112 5.14e-5  1.08e-3 8.18e-4 60/1457…     5
##  9 hsa04… Focal adhes… 7/47      201/81… 1.40e-4  2.62e-3 1.98e-3 2321/23…     7
## 10 hsa05… Prostate ca… 5/47      97/8112 2.27e-4  3.54e-3 2.68e-3 2263/51…     5
## # … with 104 more rows
write.csv2(result(tsea_lincs_up), file = "/Users/eramsey/Desktop/R21_210302/PreliminaryR21/signatureSearch_outputs/210526_lincs_up_tsea.csv" )
dsea_lincs_down <- dsea_hyperG(drugs = drugs_lincs_topdown, type = "KEGG", 
                            pvalueCutoff = 0.05 , qvalueCutoff = 0.05, 
                            minGSSize = 2, maxGSSize = 2000)
result(dsea_lincs_down)
## # A tibble: 0 x 9
## # … with 9 variables: ID <chr>, Description <chr>, GeneRatio <chr>,
## #   BgRatio <chr>, pvalue <dbl>, p.adjust <dbl>, qvalue <dbl>, itemID <chr>,
## #   Count <int>

GCMAP
The Bioconductor gCMAP (Sandmann et al. 2014) package provides access to a related but not identical implementation of the original CMAP algorithm proposed by Lamb et al. (2006). It uses as query a rank transformed GEP and the reference database is composed of the labels of up and down regulated DEG sets. This is the opposite situation of the CMAP method, where the query is composed of the labels of up and down regulated DEGs and the database contains rank transformed GESs.
First, create matrix with zscores

#changed LFC > 2 to 1.5, too few genes otherwise
DESEQ2_sig_df <- human_LFC[abs(as.numeric(human_LFC$log2FoldChange)) >1.5 & human_LFC$padj <0.05,]
LOG <- as.numeric(DESEQ2_sig_df$log2FoldChange)

IDS <- mapIds(EnsDb.Hsapiens.v75, keys= rownames(DESEQ2_sig_df) , column="ENTREZID", keytype="GENEID", multiVals="first")
## Warning: Unable to map 40 of 126 requested IDs.
names(IDS) <- NULL
LOG_V2 <- LOG[!is.na(IDS)]
IDS_V2 <-IDS[!is.na(IDS)]
duplicated_genes <- unique(IDS_V2[duplicated(IDS_V2)])
names(LOG_V2) <- IDS_V2

#fix<- c()
#for (i in 1:length(duplicated_genes)){
#  group <- LOG_V2[names(LOG_V2) == duplicated_genes[i]]
# top <- group[group == max(abs(group))]
# fix <- c(fix, top)
#}
#fix
#LOG_V3 <- LOG_V2[!names(LOG_V2) %in% duplicated_genes]
#LOG_V4 <- c(LOG_V3, fix)
#LOG_V5 <- as.data.frame(LOG_V4)
#rownames(LOG_V5) <- names(LOG_V4)

#duplicated_genes is zero 
LOG_V5 <- as.data.frame(LOG_V2)
rownames(LOG_V5) <- names(LOG_V2)
qsig_gcmap <- qSig(query = as.matrix(LOG_V5), gess_method = "gCMAP", refdb = "lincs")
## see ?signatureSearchData and browseVignettes('signatureSearchData') for documentation
## loading from cache
gcmap <- gess_gcmap(qsig_gcmap, higher = 1, lower = -1)
result(gcmap)
## # A tibble: 45,956 x 10
##    pert     PCID   cell   type  trend effect  nSet nFound signed t_gn_sym       
##    <chr>    <chr>  <chr>  <chr> <chr>  <dbl> <dbl>  <dbl> <lgl>  <chr>          
##  1 capsaze… 27334… HCC515 trt_… up     1       777      2 TRUE   TRPV1; TRPV4   
##  2 BRD-K95… 53107… A549   trt_… down  -1       936      2 TRUE   <NA>           
##  3 hydroco… 3646   HT29   trt_… down  -0.994   988      2 TRUE   <NA>           
##  4 ZM-39923 3797   PC3    trt_… down  -0.994   458      2 TRUE   <NA>           
##  5 BRD-K65… 28571… NEU    trt_… down  -0.994  1547      2 TRUE   <NA>           
##  6 RU-24969 108029 ASC    trt_… up     0.994  1278      2 TRUE   HTR1B; HTR1D; …
##  7 BRD-A31… 29988… PC3    trt_… up     0.994   525      2 TRUE   <NA>           
##  8 gingerol 442793 NEU    trt_… down  -0.994   590      2 TRUE   <NA>           
##  9 austric… 64199… HA1E   trt_… up     0.988  1791      3 TRUE   <NA>           
## 10 BRD-K58… 31076… VCAP   trt_… up     0.988   446      2 TRUE   <NA>           
## # … with 45,946 more rows

The columns in the corresponding search result table, that are specific to the gCMAP method, contain the following information. effect: scaled bi-directional enrichment score corresponding to the scaled_score under the CMAP result; nSet: number of genes in the reference gene sets after applying the higher and lower cutoff; nFound: number of genes in the reference gene sets that are present in the query signature; signed: whether the gene sets in the reference database have signs, e.g. representing up and down regulated genes when computing scores.

gcmap_res <- result(gcmap)
write.csv2(gcmap_res, file = "/Users/eramsey/Desktop/R21_210302/PreliminaryR21/signatureSearch_outputs/210604_gcmap_res_human.csv")
drugs_top20_gcmap <- c(unique(gcmap_res$pert)[1:20])
gess_res_vis(gcmap_res, drugs = drugs_top20_gcmap, col = "effect")

gcmap_res_down <- gcmap_res[gcmap_res$trend == "down",]
down_drug_gcmap <- unique(gcmap_res_down$pert)[1:20]
gcmap_res_up <- gcmap_res[gcmap_res$trend == "up",]
up_drug_gcmap <- unique(gcmap_res_up$pert)[1:20]
drugs_top20 <- c(unique(gcmap_res_down$pert)[1:20], "temozolomide")
gess_res_vis(gcmap_res, drugs = drugs_top20_gcmap, col = "effect")

TSEA for gCMAP

dup_rct_res_gcmap_down <- tsea_dup_hyperG(drugs = drugs_top20_gcmap[1:20], type = "KEGG",
                               pvalueCutoff=0.5, qvalueCutoff = 0.5)
## No targets found in DrugBank/CLUE/STITCH database for 13 drugs: 
## brd-k95547711 / hydrocotarnine / zm-39923 / brd-k65050353 / brd-a31429689 / gingerol / austricine / brd-k58628167 / brd-k85174275 / brd-k59036917 / brd-k42867405 / brd-k02641134 / mw-stk33-3a
result(dup_rct_res_gcmap_down)
## # A tibble: 27 x 9
##    ID     Description GeneRatio BgRatio   pvalue p.adjust   qvalue itemID  Count
##    <chr>  <chr>       <chr>     <chr>      <dbl>    <dbl>    <dbl> <chr>   <int>
##  1 hsa04… Neuroactiv… 11/15     341/81… 7.26e-13 1.96e-11 1.22e-11 7442/3…    11
##  2 hsa04… Serotonerg… 5/15      115/81… 1.41e- 6 1.90e- 5 1.18e- 5 3351/3…     5
##  3 hsa04… Calcium si… 6/15      240/81… 2.52e- 6 2.27e- 5 1.42e- 5 3357/3…     6
##  4 hsa04… cAMP signa… 4/15      216/81… 5.30e- 4 3.58e- 3 2.23e- 3 3351/3…     4
##  5 hsa04… Inflammato… 3/15      98/8112 7.00e- 4 3.78e- 3 2.36e- 3 7442/5…     3
##  6 hsa00… Arachidoni… 2/15      61/8112 5.48e- 3 2.47e- 2 1.54e- 2 6916/5…     2
##  7 hsa04… Gastric ac… 2/15      76/8112 8.41e- 3 3.24e- 2 2.02e- 2 887/25…     2
##  8 hsa04… Taste tran… 2/15      86/8112 1.07e- 2 3.34e- 2 2.09e- 2 3351/3…     2
##  9 hsa04… Gap juncti… 2/15      88/8112 1.11e- 2 3.34e- 2 2.09e- 2 3357/1…     2
## 10 hsa04… Pancreatic… 2/15      102/81… 1.48e- 2 3.99e- 2 2.49e- 2 5319/8…     2
## # … with 17 more rows
write.csv2(result(dup_rct_res_gcmap_down), file = "/Users/eramsey/Desktop/R21_210302/PreliminaryR21/signatureSearch_outputs/210604_gcmap_tsea_down_human.csv")
drugs_top20_gcmap_up <- c(unique(gcmap_res_up$pert)[1:20])
gess_res_vis(gcmap_res, drugs = drugs_top20_gcmap_up, col = "effect")

dup_rct_res_gcmap_up <- tsea_dup_hyperG(drugs = drugs_top20_gcmap_up[1:20], type = "KEGG",
                               pvalueCutoff=0.5, qvalueCutoff = 0.5)
## No targets found in DrugBank/CLUE/STITCH database for 13 drugs: 
## brd-a31429689 / austricine / brd-k58628167 / brd-k85174275 / brd-k59036917 / brd-k62719675 / cortisone-acetate / brd-k76357752 / brd-k00551481 / brd-k33452357 / brd-k11778372 / brd-k26359746 / brd-k66703091
result(dup_rct_res_gcmap_up)
## # A tibble: 56 x 9
##    ID     Description GeneRatio BgRatio   pvalue p.adjust   qvalue itemID  Count
##    <chr>  <chr>       <chr>     <chr>      <dbl>    <dbl>    <dbl> <chr>   <int>
##  1 hsa04… Adrenergic… 15/38     150/81… 5.34e-17 3.21e-15 1.29e-15 148/14…    15
##  2 hsa05… Arrhythmog… 12/38     77/8112 4.88e-16 1.46e-14 5.90e-15 775/77…    12
##  3 hsa04… Cardiac mu… 12/38     87/8112 2.28e-15 3.50e-14 1.41e-14 775/77…    12
##  4 hsa04… Oxytocin s… 14/38     154/81… 2.84e-15 3.50e-14 1.41e-14 775/77…    14
##  5 hsa04… MAPK signa… 17/38     294/81… 2.94e-15 3.50e-14 1.41e-14 773/77…    17
##  6 hsa05… Hypertroph… 12/38     90/8112 3.50e-15 3.50e-14 1.41e-14 775/77…    12
##  7 hsa05… Dilated ca… 12/38     96/8112 7.83e-15 6.71e-14 2.71e-14 775/77…    12
##  8 hsa04… Calcium si… 15/38     240/81… 6.36e-14 4.77e-13 1.93e-13 3357/3…    15
##  9 hsa04… Serotonerg… 11/38     115/81… 2.50e-12 1.66e-11 6.72e-12 3351/3…    11
## 10 hsa04… GnRH secre… 8/38      64/8112 3.88e-10 2.33e- 9 9.40e-10 775/77…     8
## # … with 46 more rows
write.csv2(result(dup_rct_res_gcmap_up), file = "/Users/eramsey/Desktop/R21_210302/PreliminaryR21/signatureSearch_outputs/210604_gcmap_tsea_up_human.csv")
hyperG_k_res_gcmap_up <- dsea_hyperG(drugs = drugs_top20_gcmap_up[1:20], type = "KEGG", 
                            pvalueCutoff = 0.05 , qvalueCutoff = 0.05, 
                            minGSSize = 2, maxGSSize = 2000)
result(hyperG_k_res_gcmap_up)
## # A tibble: 0 x 9
## # … with 9 variables: ID <chr>, Description <chr>, GeneRatio <chr>,
## #   BgRatio <chr>, pvalue <dbl>, p.adjust <dbl>, qvalue <dbl>, itemID <chr>,
## #   Count <int>
#0 results
dtnetplot(drugs = drugs(hyperG_k_res_gcmap_up), set = "hsa04921", 
          desc="Oxytocin signaling pathway")
## No targets found in DrugBank/CLUE/STITCH database for 13 drugs: 
## brd-a31429689 / austricine / brd-k58628167 / brd-k85174275 / brd-k59036917 / brd-k62719675 / cortisone-acetate / brd-k76357752 / brd-k00551481 / brd-k33452357 / brd-k11778372 / brd-k26359746 / brd-k66703091

Fisher Method
Fisher’s exact test (Graham J. G. Upton 1992) can also be used to search a GS-DB (gene set database) for entries that are similar to a GS-Q (gene set query). In this case both the query and the database are composed of gene label sets, such as DEG sets.

qsig_fisher <- qSig(query = as.matrix(LOG_V5), gess_method = "Fisher", refdb = "lincs")
## see ?signatureSearchData and browseVignettes('signatureSearchData') for documentation
## loading from cache
fisher <- gess_fisher(qSig = qsig_fisher, higher = 1, lower = -1)
result(fisher)[1:50,]
## # A tibble: 50 x 13
##    pert  PCID  cell  type  trend    pval   padj effect   LOR  nSet nFound signed
##    <chr> <chr> <chr> <chr> <chr>   <dbl>  <dbl>  <dbl> <dbl> <dbl>  <dbl> <lgl> 
##  1 SA-1… 5465… NKDBA trt_… over  2.54e-6 0.0127   4.71  1.62  1806     18 FALSE 
##  2 BRD-… 5664… PC3   trt_… over  3.01e-6 0.0150   4.67  1.54  2475     21 FALSE 
##  3 cycl… 1640… PHH   trt_… over  7.14e-6 0.0357   4.49  1.50  4468     28 FALSE 
##  4 QL-X… 7370… HS57… trt_… over  1.91e-5 0.0478   4.27  1.44  4657     28 FALSE 
##  5 K784… 4381… VCAP  trt_… under 6.59e-5 0.0843  -3.99 -1.32  9826     20 FALSE 
##  6 BRD-… 5333… PC3   trt_… over  6.74e-5 0.0843   3.99  1.71   732     10 FALSE 
##  7 BRD-… 7816… MCF7  trt_… over  5.28e-5 0.0843   4.04  1.34  3640     24 FALSE 
##  8 BRD-… 7632… MCF7  trt_… over  1.93e-5 0.0963   4.27  1.41  2530     20 FALSE 
##  9 BRD-… 5664… PC3   trt_… over  1.14e-4 0.114    3.86  2.21   247      6 FALSE 
## 10 HDAC… NotF… NEU   trt_… over  4.40e-4 0.122    3.52  1.26  1701     14 FALSE 
## # … with 40 more rows, and 1 more variable: t_gn_sym <chr>

The columns in the result table specific to the Fisher method include the following information. pval: p-value of the Fisher’s exact test; padj: p-value adjusted for multiple hypothesis testing using R’s p.adjust function with the Benjamini & Hochberg (BH) method; effect: z-score based on the standard normal distribution; LOR: log odds ratio.

write.csv2(result(fisher), file = "/Users/eramsey/Desktop/R21_210302/PreliminaryR21/signatureSearch_outputs/210604_fisher_res_human.csv")
fisher_res <- result(fisher)
drugs_top20_fisher <- c(unique(fisher_res$pert)[1:20])
gess_res_vis(fisher_res, drugs = drugs_top20_fisher, col = "effect")

fisher_res_over <- fisher_res[fisher_res$trend=="over", ]
up_drug_fisher <- unique(fisher_res_over$pert)[1:20]
fisher_res_under <- fisher_res[fisher_res$trend=="under", ]
down_drug_fisher <- unique(fisher_res_under$pert)[1:20]
drugs_top20_fisher_over <- c(unique(fisher_res_over$pert)[1:20])
gess_res_vis(fisher_res, drugs = drugs_top20_fisher_over, col = "effect")

dup_rct_res_fisher_over <- tsea_dup_hyperG(drugs = drugs_top20_fisher_over[1:20], type = "KEGG",
                               pvalueCutoff=0.5, qvalueCutoff = 0.5)
## No targets found in DrugBank/CLUE/STITCH database for 13 drugs: 
## sa-1472623 / brd-k23980805 / brd-k30836161 / brd-k53932786 / brd-k50659736 / brd-k87371039 / hdac1-selective / isonicotinohydroxamic-acid / brd-k89097220 / cephalosporanic-acid / brd-k26767475 / brd-k86948282 / brd-k96704748
result(dup_rct_res_fisher_over)
## # A tibble: 98 x 9
##    ID     Description GeneRatio BgRatio   pvalue p.adjust   qvalue itemID  Count
##    <chr>  <chr>       <chr>     <chr>      <dbl>    <dbl>    <dbl> <chr>   <int>
##  1 hsa04… Serotonerg… 17/57     115/81… 9.75e-19 1.51e-16 1.02e-16 773/77…    17
##  2 hsa04… Calcium si… 15/57     240/81… 5.49e-11 4.26e- 9 2.86e- 9 10105/…    15
##  3 hsa04… MAPK signa… 16/57     294/81… 8.94e-11 4.62e- 9 3.11e- 9 5530/5…    16
##  4 hsa04… Adrenergic… 12/57     150/81… 3.55e-10 1.38e- 8 9.26e- 9 147/14…    12
##  5 hsa04… Oxytocin s… 11/57     154/81… 7.03e- 9 2.18e- 7 1.46e- 7 5530/5…    11
##  6 hsa05… Chemical c… 8/57      69/8112 2.15e- 8 5.54e- 7 3.73e- 7 1577/1…     8
##  7 hsa04… Type II di… 7/57      46/8112 2.50e- 8 5.55e- 7 3.73e- 7 2475/7…     7
##  8 hsa05… Arrhythmog… 8/57      77/8112 5.17e- 8 1.00e- 6 6.73e- 7 775/77…     8
##  9 hsa04… Taste tran… 8/57      86/8112 1.24e- 7 2.11e- 6 1.42e- 6 773/77…     8
## 10 hsa04… Cardiac mu… 8/57      87/8112 1.36e- 7 2.11e- 6 1.42e- 6 775/77…     8
## # … with 88 more rows
write.csv2(result(dup_rct_res_fisher_over), file = "/Users/eramsey/Desktop/R21_210302/PreliminaryR21/signatureSearch_outputs/210604_fisher_tsea_over_human.csv")
hyperG_k_res_fisher_over <- dsea_hyperG(drugs = drugs_top20_fisher_over[1:20], type = "KEGG", 
                            pvalueCutoff = 0.05 , qvalueCutoff = 0.05, 
                            minGSSize = 2, maxGSSize = 2000)
result(hyperG_k_res_fisher_over)
## # A tibble: 20 x 9
##    ID     Description   GeneRatio BgRatio  pvalue p.adjust qvalue itemID   Count
##    <chr>  <chr>         <chr>     <chr>     <dbl>    <dbl>  <dbl> <chr>    <int>
##  1 hsa05… Amyotrophic … 4/7       1174/2… 3.26e-4   0.0185 0.0117 cyclosp…     4
##  2 hsa05… Huntington d… 4/7       1185/2… 3.38e-4   0.0185 0.0117 cyclosp…     4
##  3 hsa05… Parkinson di… 4/7       1222/2… 3.80e-4   0.0185 0.0117 cyclosp…     4
##  4 hsa05… Prion disease 4/7       1382/2… 6.10e-4   0.0223 0.0141 cyclosp…     4
##  5 hsa00… Retinol meta… 3/7       732/20… 1.43e-3   0.0305 0.0194 cyclosp…     3
##  6 hsa05… Chemical car… 3/7       732/20… 1.43e-3   0.0305 0.0194 cyclosp…     3
##  7 hsa04… Long-term po… 3/7       777/20… 1.69e-3   0.0305 0.0194 cyclosp…     3
##  8 hsa02… ABC transpor… 2/7       199/20… 1.91e-3   0.0305 0.0194 cyclosp…     2
##  9 hsa00… Drug metabol… 3/7       853/20… 2.22e-3   0.0305 0.0194 cyclosp…     3
## 10 hsa00… Metabolism o… 3/7       869/20… 2.34e-3   0.0305 0.0194 cyclosp…     3
## 11 hsa00… Steroid horm… 3/7       906/20… 2.63e-3   0.0305 0.0194 cyclosp…     3
## 12 hsa05… Spinocerebel… 3/7       906/20… 2.63e-3   0.0305 0.0194 cyclosp…     3
## 13 hsa05… Amphetamine … 3/7       916/20… 2.72e-3   0.0305 0.0194 cyclosp…     3
## 14 hsa05… Herpes simpl… 3/7       1122/2… 4.85e-3   0.0491 0.0311 ql-x-13…     3
## 15 hsa04… Oxytocin sig… 3/7       1148/2… 5.17e-3   0.0491 0.0311 cyclosp…     3
## 16 hsa04… Synaptic ves… 2/7       357/20… 5.99e-3   0.0491 0.0311 verapam…     2
## 17 hsa04… Growth hormo… 3/7       1217/2… 6.10e-3   0.0491 0.0311 ql-x-13…     3
## 18 hsa04… Cellular sen… 3/7       1248/2… 6.55e-3   0.0491 0.0311 cyclosp…     3
## 19 hsa01… Endocrine re… 3/7       1259/2… 6.71e-3   0.0491 0.0311 cyclosp…     3
## 20 hsa04… Non-alcoholi… 3/7       1260/2… 6.72e-3   0.0491 0.0311 verapam…     3
drugs_top20_fisher_under <- c(unique(fisher_res_under$pert)[1:20])
gess_res_vis(fisher_res, drugs = drugs_top20_fisher_under, col = "effect")

dup_rct_res_fisher_under <- tsea_dup_hyperG(drugs = drugs_top20_fisher_under[1:20], type = "KEGG",
                               pvalueCutoff=0.5, qvalueCutoff = 0.5)
## No targets found in DrugBank/CLUE/STITCH database for 15 drugs: 
## k784-3131 / oxetane / brd-k32111336 / brd-k65134192 / brd-k04722205 / bg-1021 / brd-k53932786 / ran-14b / brd-k66448556 / brd-k74606027 / gnf-2 / brd-k10806285 / ku-c103655 / brd-k41284916 / brd-k96194181
result(dup_rct_res_fisher_under)
## # A tibble: 182 x 9
##    ID     Description GeneRatio BgRatio   pvalue p.adjust   qvalue itemID  Count
##    <chr>  <chr>       <chr>     <chr>      <dbl>    <dbl>    <dbl> <chr>   <int>
##  1 hsa04… Dopaminerg… 11/30     132/81… 5.84e-13 1.09e-10 4.30e-11 7054/1…    11
##  2 hsa04… Fc epsilon… 7/30      68/8112 3.71e- 9 2.83e- 7 1.12e- 7 207/24…     7
##  3 hsa04… Prolactin … 7/30      70/8112 4.56e- 9 2.83e- 7 1.12e- 7 7054/2…     7
##  4 hsa05… Yersinia i… 8/30      137/81… 2.31e- 8 1.07e- 6 4.25e- 7 207/29…     8
##  5 hsa00… Folate bio… 5/30      26/8112 3.04e- 8 1.13e- 6 4.48e- 7 5053/5…     5
##  6 hsa04… T cell rec… 7/30      104/81… 7.42e- 8 2.30e- 6 9.12e- 7 207/29…     7
##  7 hsa05… Toxoplasmo… 7/30      112/81… 1.24e- 7 3.30e- 6 1.31e- 6 4843/2…     7
##  8 hsa04… Sphingolip… 7/30      119/81… 1.89e- 7 3.90e- 6 1.54e- 6 207/56…     7
##  9 hsa04… Growth hor… 7/30      119/81… 1.89e- 7 3.90e- 6 1.54e- 6 207/29…     7
## 10 hsa04… Osteoclast… 7/30      128/81… 3.11e- 7 5.55e- 6 2.20e- 6 5468/2…     7
## # … with 172 more rows
qsig_sp <- qSig(query = as.matrix(LOG_V5), gess_method = "Cor", refdb = "lincs")
## see ?signatureSearchData and browseVignettes('signatureSearchData') for documentation
## loading from cache
sp <- gess_cor(qSig=qsig_sp, method="spearman")
result(sp)[1:50,]
## # A tibble: 50 x 7
##    pert     PCID    cell  type  trend cor_score t_gn_sym                        
##    <chr>    <chr>   <chr> <chr> <chr>     <dbl> <chr>                           
##  1 galanta… 9651    HT29  trt_… down     -0.618 ACHE; BCHE; CHRNA1; CHRNA10; CH…
##  2 STK-064… 2878746 PC3   trt_… up        0.573 <NA>                            
##  3 BRD-K42… 546384… NPC   trt_… down     -0.552 <NA>                            
##  4 BRD-K56… 546665… PC3   trt_… down     -0.546 <NA>                            
##  5 BRD-K91… 445014… A375  trt_… down     -0.540 <NA>                            
##  6 BRD-K71… 107410  FIBR… trt_… down     -0.533 <NA>                            
##  7 salbuta… 2083    VCAP  trt_… up        0.530 ADCY10; ADRB1; ADRB2; ADRB3     
##  8 BRD-K48… 546501… NPC   trt_… up        0.529 <NA>                            
##  9 BRD-K90… 498358… VCAP  trt_… up        0.526 <NA>                            
## 10 BRD-K93… 6404603 A375  trt_… down     -0.523 <NA>                            
## # … with 40 more rows
sp_res <- result(sp)
write.csv2(result(sp), file = "/Users/eramsey/Desktop/R21_210302/PreliminaryR21/signatureSearch_outputs/210604_spearman_res_human.csv")
drugs_top20_spearman <- c(unique(sp_res$pert)[1:20])
gess_res_vis(sp_res, drugs = drugs_top20_spearman, col = "cor_score")

sp_res_up <- sp_res[sp_res$trend == "up",]
up_drug_sp <- unique(sp_res_up$pert)[1:20]
sp_res_down <- sp_res[sp_res$trend == "down",]
down_drug_sp <- unique(sp_res_down$pert)[1:20]
drugs_top20_sp_down <- c(unique(sp_res_down$pert)[1:20])
gess_res_vis(sp_res, drugs = drugs_top20_sp_down, col = "cor_score")

dup_rct_res_sp_down <- tsea_dup_hyperG(drugs = drugs_top20_sp_down[1:20], type = "KEGG",
                               pvalueCutoff=0.5, qvalueCutoff = 0.5)
## No targets found in DrugBank/CLUE/STITCH database for 14 drugs: 
## brd-k42743267 / brd-k56450018 / brd-k91063002 / brd-k71339761 / brd-k93623754 / brd-k67306351 / brd-k81408913 / brd-k99874282 / brd-k59488055 / cercosporin / brd-k52099002 / brd-k28162668 / brd-k97522823 / brd-k84024786
result(dup_rct_res_sp_down)
## # A tibble: 51 x 9
##    ID     Description GeneRatio BgRatio   pvalue p.adjust   qvalue itemID  Count
##    <chr>  <chr>       <chr>     <chr>      <dbl>    <dbl>    <dbl> <chr>   <int>
##  1 hsa04… Neuroactiv… 29/58     341/81… 3.75e-25 4.49e-23 3.63e-23 1134/5…    29
##  2 hsa05… Nicotine a… 9/58      40/8112 5.89e-12 3.53e-10 2.85e-10 1137/8…     9
##  3 hsa04… Insulin se… 11/58     86/8112 1.50e-11 4.61e-10 3.72e-10 3778/3…    11
##  4 hsa04… Cholinergi… 12/58     113/81… 1.54e-11 4.61e-10 3.72e-10 43/113…    12
##  5 hsa05… Chemical c… 12/58     212/81… 2.30e- 8 5.53e- 7 4.46e- 7 1136/1…    12
##  6 hsa04… cGMP-PKG s… 9/58      167/81… 2.40e- 6 4.80e- 5 3.87e- 5 4985/3…     9
##  7 hsa04… Leukocyte … 7/58      114/81… 1.50e- 5 2.46e- 4 1.99e- 4 1535/1…     7
##  8 hsa05… Leishmania… 6/58      77/8112 1.64e- 5 2.46e- 4 1.99e- 4 1535/1…     6
##  9 hsa05… Diabetic c… 8/58      203/81… 8.77e- 5 1.17e- 3 9.44e- 4 1535/1…     8
## 10 hsa05… Prion dise… 9/58      273/81… 1.22e- 4 1.46e- 3 1.18e- 3 1535/1…     9
## # … with 41 more rows
hyperG_k_res_sp_down<- dsea_hyperG(drugs = drugs_top20_sp_down[1:20], type = "KEGG", 
                            pvalueCutoff = 0.05 , qvalueCutoff = 0.05, 
                            minGSSize = 2, maxGSSize = 2000)
result(hyperG_k_res_sp_down)
## # A tibble: 4 x 9
##   ID     Description  GeneRatio BgRatio  pvalue p.adjust  qvalue itemID    Count
##   <chr>  <chr>        <chr>     <chr>     <dbl>    <dbl>   <dbl> <chr>     <int>
## 1 hsa04… Pancreatic … 4/6       857/20… 4.25e-5  0.00472 0.00430 dextrome…     4
## 2 hsa04… cGMP-PKG si… 4/6       1509/2… 3.89e-4  0.0169  0.0154  dextrome…     4
## 3 hsa04… Insulin sec… 3/6       595/20… 4.56e-4  0.0169  0.0154  miconazo…     3
## 4 hsa04… Salivary se… 3/6       884/20… 1.45e-3  0.0402  0.0366  miconazo…     3
drugs_top20_sp_up <- c(unique(sp_res_up$pert)[1:20])
gess_res_vis(sp_res, drugs = drugs_top20_sp_up, col = "cor_score")

dup_rct_res_sp_up <- tsea_dup_hyperG(drugs = drugs_top20[1:20], type = "KEGG",
                               pvalueCutoff=0.5, qvalueCutoff = 0.5)
## No targets found in DrugBank/CLUE/STITCH database for 10 drugs: 
## brd-k95547711 / hydrocotarnine / zm-39923 / brd-k65050353 / gingerol / brd-k42867405 / brd-k02641134 / mw-stk33-3a / brd-k64106162 / brd-k12393722
result(dup_rct_res_sp_up)
## # A tibble: 196 x 9
##    ID     Description GeneRatio BgRatio   pvalue p.adjust   qvalue itemID  Count
##    <chr>  <chr>       <chr>     <chr>      <dbl>    <dbl>    <dbl> <chr>   <int>
##  1 hsa05… PD-L1 expr… 12/50     89/8112 1.23e-13 2.49e-11 8.39e-12 7535/3…    12
##  2 hsa05… Yersinia i… 11/50     137/81… 4.53e-10 4.60e- 8 1.55e- 8 2185/7…    11
##  3 hsa05… Hepatitis B 11/50     162/81… 2.74e- 9 1.85e- 7 6.24e- 8 1017/2…    11
##  4 hsa04… Prolactin … 8/50      70/8112 8.21e- 9 4.17e- 7 1.40e- 7 3717/2…     8
##  5 hsa04… T cell rec… 9/50      104/81… 1.07e- 8 4.34e- 7 1.46e- 7 7535/2…     9
##  6 hsa04… Growth hor… 9/50      119/81… 3.51e- 8 1.19e- 6 4.00e- 7 3717/2…     9
##  7 hsa05… Human immu… 11/50     212/81… 4.58e- 8 1.33e- 6 4.48e- 7 983/21…    11
##  8 hsa05… Lipid and … 11/50     215/81… 5.29e- 8 1.34e- 6 4.53e- 7 1559/3…    11
##  9 hsa04… Osteoclast… 9/50      128/81… 6.63e- 8 1.48e- 6 5.00e- 7 7297/2…     9
## 10 hsa04… Th1 and Th… 8/50      92/8112 7.31e- 8 1.48e- 6 5.00e- 7 7535/3…     8
## # … with 186 more rows
write.csv2(result(dup_rct_res_sp_up), file = "/Users/eramsey/Desktop/R21_210302/PreliminaryR21/signatureSearch_outputs/210604_spearman_res_human.csv")
hyperG_k_res_sp_up <- dsea_hyperG(drugs = drugs_top20_sp_up[1:20], type = "KEGG", 
                            pvalueCutoff = 0.05 , qvalueCutoff = 0.05, 
                            minGSSize = 2, maxGSSize = 2000)
result(hyperG_k_res_sp_up)
## # A tibble: 0 x 9
## # … with 9 variables: ID <chr>, Description <chr>, GeneRatio <chr>,
## #   BgRatio <chr>, pvalue <dbl>, p.adjust <dbl>, qvalue <dbl>, itemID <chr>,
## #   Count <int>
intersect(up_drug_cmap, up_drug_lincs)
## character(0)
intersect(up_drug_cmap, up_drug_gcmap)
## character(0)
intersect(up_drug_cmap, up_drug_sp)
## character(0)
intersect(up_drug_cmap, up_drug_fisher)
## character(0)
intersect(up_drug_lincs, up_drug_gcmap)
## character(0)
intersect(up_drug_lincs, up_drug_sp)
## character(0)
intersect(up_drug_lincs, up_drug_fisher)
## character(0)
intersect(up_drug_gcmap, up_drug_sp)
## character(0)
intersect(up_drug_gcmap, up_drug_fisher)
## character(0)
intersect(up_drug_sp, up_drug_fisher)
## [1] "HDAC1-selective"

END

Location of final scripts: Locally /Users/eramsey/Desktop/R21_210302/PreliminaryR21/signatureSearch_outputs/

Operations complete: 210604

Versions sessionInfo()